Structure-Aware Residual Pyramid Network for Monocular Depth Estimation

Xiaotian Chen      Xuejin Chen*      Zheng-Jun Zha     

National Engineering Laboratory for Brain-inspired Intelligence Technology and Application

University of Science and Technology of China

International Joint Conference on Artificial Intelligence (IJCAI) 2019

Overview

Figure 1: The network architecture. Our Structure-Aware Residual Pyramid Network consists of an encoder that extracts multi-scale visual features, a Residual Pyramid Decoder (RPD) that progressively infers depth maps in a coarse-to-fine manner, and an Adaptive Dense Feature Fusion (ADFF) module for dense feature fusion. The residual pyramid effectively adds structural details at each level on top of the scene layout predicted at the coarser level.
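
To illustrate the coarse-to-fine decoding scheme, here is a minimal PyTorch sketch of a residual pyramid decoder: the coarsest level predicts the global scene layout directly, and each finer level predicts a residual map that is added to the upsampled coarser depth. Module names, channel widths, and the number of levels are illustrative assumptions and do not reproduce the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualRefinementModule(nn.Module):
    # Predicts a residual depth map from the current level's features plus
    # the upsampled coarser depth, then adds it to the coarse prediction.
    def __init__(self, feat_channels):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(feat_channels + 1, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),  # residual depth map
        )

    def forward(self, feat, coarse_depth):
        up = F.interpolate(coarse_depth, size=feat.shape[2:],
                           mode='bilinear', align_corners=False)
        return up + self.head(torch.cat([feat, up], dim=1))

class ResidualPyramidDecoder(nn.Module):
    # Coarse-to-fine decoding: the coarsest level predicts the scene layout;
    # each finer level adds structural details via a residual map.
    def __init__(self, feat_channels, num_levels):
        super().__init__()
        self.coarse_head = nn.Conv2d(feat_channels, 1, 3, padding=1)
        self.rrms = nn.ModuleList(
            ResidualRefinementModule(feat_channels)
            for _ in range(num_levels - 1)
        )

    def forward(self, fused_feats):
        # fused_feats: per-level features (e.g. from ADFF), coarsest first.
        depth = self.coarse_head(fused_feats[0])  # global scene layout
        for rrm, feat in zip(self.rrms, fused_feats[1:]):
            depth = rrm(feat, depth)  # refine with finer structures
        return depth

# Example with dummy features from 8x8 (coarsest) to 64x64 (finest):
feats = [torch.randn(1, 64, s, s) for s in (8, 16, 32, 64)]
decoder = ResidualPyramidDecoder(feat_channels=64, num_levels=4)
depth = decoder(feats)  # shape: (1, 1, 64, 64)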

 

Abstract:

Monocular depth estimation is an essential task for scene understanding. The underlying structure of objects and stuff in a complex scene is critical to recovering accurate and visually pleasing depth maps. Global structure conveys scene layouts, while local structure reflects shape details. Recently developed approaches based on convolutional neural networks (CNNs) have significantly improved the performance of depth estimation. However, few of them take into account the multi-scale structures in complex scenes. In this paper, we propose a Structure-Aware Residual Pyramid Network (SARPN) to exploit multi-scale structures for accurate depth prediction. We propose a Residual Pyramid Decoder (RPD) that expresses global scene structure in its upper levels to represent scene layouts, and local structure in its lower levels to capture shape details. At each level, we propose a Residual Refinement Module (RRM) that predicts a residual map to progressively add finer structures on top of the coarser structure predicted at the upper level. In order to fully exploit multi-scale image features, an Adaptive Dense Feature Fusion (ADFF) module, which adaptively fuses effective features from all scales for inferring the structures of each scale, is introduced. Experimental results on the challenging NYU-Depth v2 dataset demonstrate that our proposed approach achieves state-of-the-art performance in both qualitative and quantitative evaluations.
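
To make the fusion step concrete, below is a minimal PyTorch sketch of the adaptive dense feature fusion idea described above: features from every encoder scale are resized to a target level's resolution, projected by learned 1x1 convolutions, and summed, so that each decoder level can draw on all scales. The class name, channel widths, and the project-then-sum scheme are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveDenseFeatureFusion(nn.Module):
    # Fuses encoder features from all scales into one map at a target
    # resolution, so every decoder level sees both global and local cues.
    def __init__(self, in_channels_list, out_channels):
        super().__init__()
        # One learned 1x1 projection per encoder scale (channel counts
        # below are assumptions for illustration).
        self.projections = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1)
            for c in in_channels_list
        )

    def forward(self, multi_scale_feats, target_size):
        fused = 0
        for proj, feat in zip(self.projections, multi_scale_feats):
            # Resize every scale to the target level's resolution.
            feat = F.interpolate(feat, size=target_size, mode='bilinear',
                                 align_corners=False)
            fused = fused + proj(feat)
        return fused

# Example: fuse four encoder scales (dummy tensors) for a 16x16 decoder level.
feats = [torch.randn(1, c, s, s)
         for c, s in [(256, 8), (128, 16), (64, 32), (32, 64)]]
adff = AdaptiveDenseFeatureFusion([256, 128, 64, 32], out_channels=64)
fused = adff(feats, target_size=(16, 16))  # shape: (1, 64, 16, 16)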

 

Results:

Figure 2: Qualitative results on the NYU-Depth v2 dataset.
 
Figure 3: Comparison with [Jiao et al., 2018]. The depth maps predicted by our method preserve much more accurate depth around object boundaries and keep finer structures, as highlighted in the boxes.
 
Figure 4: 3D projection from predicted depth maps. Our method better preserves scene structure at various scales, especially the flat shape of large planar regions.
 
Acknowledgements:

This work was supported by the National Key Research & Development Plan of China under Grant 2018YFC0307905, the National Natural Science Foundation of China (NSFC) under Grants 61632006, 61622211, and 61620106009, the Priority Research Program of the Chinese Academy of Sciences under Grant XDB06040900, as well as the Fundamental Research Funds for the Central Universities under Grants WK3490000003 and WK2100100030.

 
Main References:

[1] Junjie Hu, Mete Ozay, Yan Zhang, and Takayuki Okatani. Revisiting single image depth estimation: Toward higher resolution maps with accurate object boundaries. In IEEE Winter Conference on Applications of Computer Vision, 2019.

[2] Jianbo Jiao, Ying Cao, Yibing Song, and Rynson Lau. Look deeper into depth: Monocular depth estimation with semantic booster and attention-driven loss. In European Conference on Computer Vision, pages 53–69, 2018.

[3] Iro Laina, Christian Rupprecht, Vasileios Belagiannis, Federico Tombari, and Nassir Navab. Deeper depth prediction with fully convolutional residual networks. In International Conference on 3D Vision (3DV), pages 239–248, 2016.

 
BibTex:
@inproceedings{Chen2019SARPN,
  author    = {Chen, Xiaotian and Chen, Xuejin and Zha, Zheng-Jun},
  title     = {Structure-Aware Residual Pyramid Network for Monocular Depth Estimation},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2019}
}
 
Downloads:
Disclaimer: The paper listed on this page is copyright-protected. By clicking on the paper link below, you confirm that you or your institution have the right to access the corresponding PDF file.

Copyright © 2018 GCL, USTC